On Invariant Post Randomization for Statistical Disclosure Control
Authors
Abstract
In this paper, we investigate certain operational and inferential aspects of invariant PRAM (post randomization method) as a tool for disclosure limitation of categorical data. Invariant PRAMs preserve unbiasedness of certain estimators, but inflate their variances and distort other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method. However, the procedure seems feasible only in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, is an invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation due to invariant PRAMing and for the variances of certain estimators of the cell probabilities, along with their tight upper bounds. We discuss estimation of these quantities and thereby the assessment of statistical efficiency loss due to invariant PRAMing. We find a connection between invariant PRAM and creating partially synthetic data using a nonparametric

∗ Center for Disclosure Avoidance Research, U.S. Census Bureau, Washington, DC 20233 and Department of Statistics, George Washington University, Washington, DC 20052.
† U.S. Energy Information Administration, Washington, DC 20585.
‡ The views expressed in this article are those of the authors and not necessarily those of the U.S. Census Bureau. The analysis and conclusions contained in this paper are those of the authors and do not represent the official position of the U.S. Energy Information Administration (EIA) or the U.S. Department of Energy (DOE).
Similar resources
Logistic Regression with Variables Subject to Post Randomization Method
An increase in quality and detail of publicly available databases increases the risk of disclosure of sensitive personal information contained in such databases. The goal of Statistical Disclosure Control (SDC) is to develop methodology that aims at minimizing disclosure risk while providing society with as much information as possible needed for valid statistical inference. The Post Randomizat...
Preserving Edits When Perturbing Microdata for Statistical Disclosure Control (Natalie Shlomo, Ton De Waal)
To protect individuals in microdata from the risk of re-identification, a general perturbative method called PRAM (the Post-Randomization Method) is sometimes used for masking records. This method adds “noise” to categorical variables by changing values of categories for a small number of records according to a prescribed probability matrix and a stochastic process based on the outcome of a ran...
k-Anonymous Microdata Release via Post Randomisation Method
The problem of the release of anonymized microdata is an important topic in the fields of statistical disclosure control (SDC) and privacy preserving data publishing (PPDP), and yet it remains sufficiently unsolved. In these research fields, k-anonymity has been widely studied as an anonymity notion for mainly deterministic anonymization algorithms, and some probabilistic relaxations have been ...
An empirical evaluation of PRAM (Discussion paper 04012)
The views expressed in this paper are those of the authors and do not necessarily reflect the policies of Statistics Netherlands. Explanation of symbols: . = data not available; * = provisional figure; x = publication prohibited (confidential figure); – = nil or less than half of unit concerned; – = (between two figures) inclusive; 0 (0,0) = less than half of unit concerned; blank = Due to rounding, som...
Measuring Identification Risk in Microdata Release and Its Control by Post-randomization
Statistical agencies often release a masked or perturbed version of survey data to protect respondents’ confidentiality. Ideally, a perturbation procedure should protect confidentiality without much loss of data quality, so that released data may practically be treated as original data for making inferences. One major objective is to control the risk of correctly identifying any respondent’s re...
Publication date: 2014